Conversation

@smarterclayton commented Sep 25, 2025

In some benchmarking and test environments, dynamic prefill selection may be difficult, and random selection among a set of hosts is sufficient.

Add a new `--enable-prefiller-sampling` flag that instructs the sidecar to select a random prefill host from the provided list instead of the first one. Make the behavior opt-in to prevent users from accidentally depending on the new behavior, and keep the existing default behavior (first header value) consistent.

E.g.:

    curl -H 'x-prefiller-host-port: server1:8000' -H 'x-prefiller-host-port: server2:8000'

will randomly choose one of the two values.

This allows static test environments to use multiple hardcoded hosts for testing. A load balancer may still be desirable, but I am chasing an issue where using a load balanced prefiller group does not result in the correct serving behavior.

@smarterclayton (Author) commented:

This enables a configuration where `vllm bench serve` passes a static `x-prefiller-host-port` header (via the newly added `--header` support) for a benchmark run that approximates round-robin load balancing across DP>8 instances, with no dependencies other than a Kubernetes Service to balance between the decoders.
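A sketch of what such a benchmark invocation could look like. This is a config fragment, not a verified command line: it assumes the new `--header` flag accepts `'Name: value'` strings and may be repeated, and the host names and remaining arguments are placeholders.

```shell
# Illustrative only: assumes --header is repeatable; hosts are placeholders.
# The sidecar, started with --enable-prefiller-sampling, picks one of the
# supplied prefill hosts at random per request.
vllm bench serve \
  --header 'x-prefiller-host-port: server1:8000' \
  --header 'x-prefiller-host-port: server2:8000' \
  ...  # remaining benchmark arguments omitted
```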

@smarterclayton force-pushed the enable_prefiller_sampling branch from 1143dac to b49f6e3 on October 13, 2025 20:28
@elevran (Contributor) commented Oct 29, 2025

@smarterclayton as we've moved the routing sidecar code into llm-d-inference-scheduler, could you kindly close the PR here and move the code over to the new repo if needed?
Also note that there are lint errors (unused param in test code).

@smarterclayton (Author) commented:

Moved to llm-d/llm-d-inference-scheduler#404
